Assessing Trends in World News
Analyzing Reddit Data Using Pyspark-AWS Framework
Overview
- Summary of Results
- Introduction to the Data
- Natural Language Processing
- Machine Learning
- AWS Infrastructure and Methodology
Key Takeaways
- r/worldnews subreddit
- Primarily Western Viewpoint
- Gaps in News Coverage (ACLED)
- Russia-Ukraine Conflict Dominates the Topic Space
- Spatial Granularity in NER
- Plurality of Negative Sentiment
Introduction to Data
- Subreddit: Assumed Neutrality, Popularity
- User Activity: 27,000 Distinct Posters, 1.2 Million Distinct Commenters
- Live Threads: Daily Coverage of Conflict
- Surge in Submissions/Comments at the Onset of War
- Russia-Ukraine Conflict: 27% Posts, 45% Comments
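A share such as the 27% of posts reported above could be estimated by flagging each text against a keyword pattern and dividing by the total. The sketch below is a minimal plain-Python illustration, not the method used in this analysis: the keyword list, the function names, and the sample titles are all hypothetical, and the slides do not say how conflict-related posts were actually identified.

```python
import re

# Hypothetical keyword list; the terms actually used in the analysis
# are not given in the slides.
CONFLICT_TERMS = ["russia", "ukraine", "putin", "zelensky", "kyiv"]

# Match any term at a word boundary, allowing suffixes ("Russian", "Ukrainians").
CONFLICT_PATTERN = re.compile(
    r"\b(" + "|".join(CONFLICT_TERMS) + r")\w*", re.IGNORECASE
)


def is_conflict_related(text):
    """Return True if the text mentions any of the assumed conflict terms."""
    return CONFLICT_PATTERN.search(text) is not None


def conflict_share(texts):
    """Return the fraction of texts flagged as conflict-related."""
    texts = list(texts)
    if not texts:
        return 0.0
    flagged = sum(is_conflict_related(t) for t in texts)
    return flagged / len(texts)


# Made-up sample titles, for illustration only.
sample_titles = [
    "Ukraine reports overnight shelling in Kharkiv",
    "Heatwave breaks records across southern Europe",
    "Russian forces withdraw from key town",
    "Central bank raises interest rates",
]

share = conflict_share(sample_titles)
print(f"{share:.0%} of sample titles mention the conflict")
```

Keyword matching is cheap to run at scale but will miss posts that refer to the conflict indirectly, so a share computed this way is best read as a lower bound.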
Introduction to Data (Continued)
- Western inclination challenges assumed neutrality
- Divergent Pattern of User Behaviour in Social Media Sites
- ACLED Aggregated Conflict Events demonstrate gaps in news coverage
Natural Language Processing Results
Topic Modeling:
- Russia-Ukraine War Topics Dominate
- Facets of Conflict
Named Entity Recognition (NER):
- NER reinforces the prevalence of War Posts
- Location Entities widely used
War Posts Frequent in the Topic Space
Location Based Entities Dominate the Posts
Machine Learning Results
Sentiment Analysis:
- 4 models: 3 pretrained and 1 lexicon-based
- VADER assumed to be most accurate
- Vivek Model for Submissions:
Predominantly Negative Sentiments Across Models
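To show what a lexicon-based approach does in principle, the sketch below scores text by summing per-word valences and then tallies the resulting labels to find the plurality class. It is a toy illustration only: the lexicon, the negation rule, the threshold, and the sample headlines are invented here, and VADER's actual lexicon and heuristics are considerably richer.

```python
from collections import Counter

# Toy valence lexicon, invented for illustration; not VADER's lexicon.
TOY_LEXICON = {
    "attack": -2.0, "killed": -3.0, "crisis": -2.0, "war": -2.5,
    "peace": 2.0, "aid": 1.5, "agreement": 1.0, "rescued": 2.0,
}
NEGATIONS = {"no", "not", "never"}
PUNCTUATION = ".,!?;:\"'"


def lexicon_score(text):
    """Sum word valences, flipping the sign of a word preceded by a negation."""
    words = [tok.strip(PUNCTUATION) for tok in text.lower().split()]
    score = 0.0
    for i, word in enumerate(words):
        valence = TOY_LEXICON.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATIONS:
            valence = -valence
        score += valence
    return score


def sentiment_label(score, threshold=0.5):
    """Map a raw score to a label; the threshold is an arbitrary assumption."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Made-up headlines, for illustration only.
headlines = [
    "Missile attack killed civilians",
    "Leaders sign peace agreement",
    "Parliament meets on Tuesday",
    "No peace in sight",
]

labels = [sentiment_label(lexicon_score(h)) for h in headlines]
label_counts = Counter(labels)
plurality_label, plurality_count = label_counts.most_common(1)[0]
print(label_counts, "plurality:", plurality_label)
```

A lexicon method needs no training data, which makes it a convenient reference point when comparing against pretrained models, though it cannot capture context beyond simple rules like the negation flip shown here.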
AWS Infrastructure and Methodologies Employed
AWS Pipeline